Researchers produce thousands of scholarly documents containing valuable technical knowledge. The community faces the laborious task of reading these documents to identify, extract, and synthesize information. To automate information gathering, document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge. Finetuning QA systems requires access to labeled data (tuples of context, question and answer). However, data curation for document QA is uniquely challenging because the context (i.e. answer evidence passage) needs to be retrieved from potentially long, ill-formatted documents. Existing QA datasets sidestep this challenge by providing short, well-defined contexts that are unrealistic in real-world applications. We present a three-stage document QA approach: (1) text extraction from PDF; (2) evidence retrieval from extracted texts to form well-posed contexts; (3) QA to extract knowledge from contexts to return high-quality answers -- extractive, abstractive, or Boolean. Using QASPER for evaluation, our detect-retrieve-comprehend (DRC) system achieves a +7.19 improvement in Answer-F1 over existing baselines while delivering superior context selection. Our results demonstrate that DRC holds tremendous promise as a flexible framework for practical scientific document QA.
translated by 谷歌翻译
我们提出了Clip-Lite,一种通过与文本注释的特征对齐方式进行视觉表示学习的信息有效方法。与先前提出的剪辑模型相比,剪辑液在优化其对比学学习目标期间只需要一个负图像文本样本对。我们通过利用信息有效的较低限制来实现这一点,以最大化两个输入模态之间的相互信息。这允许剪辑Lite培训,在获得比夹子的更好的性能的同时具有显着减少的数据和批量尺寸。我们通过在Coco-Tablions数据集上预先绘制来评估剪贴画并对其他数据集进行测试传输。 Clip-Lite在Pascal VOC分类上获得+ 15.4%的映射绝对增益,并在ImageNet上获得A + 22.1%的前1个精度增益,同时与其他更复杂,文本监督模型相当或优越。 Clip-Lite还优于剪辑图像和文本检索,零拍分类和视觉接地。最后,通过在表示学习期间执行显式图像文本对齐,我们显示Clip-Lite可以利用语言语义来鼓励可以在下游任务中使用的无偏见的视觉表示。
translated by 谷歌翻译
在这项工作中,我们提出了相互信息最大化知识蒸馏(MIMKD)。我们的方法使用对比目标来同时估计,并最大化教师和学生网络之间的本地和全球特征表示的相互信息的下限。我们通过广泛的实验证明,这可以通过将知识从更加性能但计算昂贵的模型转移来改善低容量模型的性能。这可用于产生更好的模型,可以在具有低计算资源的设备上运行。我们的方法灵活,我们可以将具有任意网络架构的教师蒸馏到任意学生网络。我们的经验结果表明,MIMKD优于各种学生教师对的竞争方法,具有不同的架构,以及学生网络的容量极低。我们能够通过从Reset-50蒸馏出来的知识,从基线精度为Shufflenetv2获得74.55%的精度。在Imagenet上,我们使用Reset-34教师网络将Reset-18网络从68.88%提高到70.32%的准确度(1.44%+)。
translated by 谷歌翻译
We propose two efficient approximations to standard convolutional neural networks: Binary-Weight-Networks and XNOR-Networks. In Binary-Weight-Networks, the filters are approximated with binary values resulting in 32× memory saving. In XNOR-Networks, both the filters and the input to convolutional layers are binary. XNOR-Networks approximate convolutions using primarily binary operations. This results in 58× faster convolutional operations (in terms of number of the high precision operations) and 32× memory savings. XNOR-Nets offer the possibility of running state-of-the-art networks on CPUs (rather than GPUs) in real-time. Our binary networks are simple, accurate, efficient, and work on challenging visual tasks. We evaluate our approach on the ImageNet classification task. The classification accuracy with a Binary-Weight-Network version of AlexNet is the same as the full-precision AlexNet. We compare our method with recent network binarization methods, BinaryConnect and BinaryNets, and outperform these methods by large margins on ImageNet, more than 16% in top-1 accuracy. Our code is available at: http://allenai.org/plato/xnornet.
translated by 谷歌翻译